Semi-automatic Data Extraction from Tables

نویسندگان

  • Nikita Astrakhantsev
  • Denis Turdakov
  • Natalia Vassilieva
چکیده

This paper describes a novel approach to automate extraction of useful information from tables and to record the knowledge procured in a structured data repository. The approach is based on modeling a behavior of an expert, who collects tabular data and maps them to a predefined relational schema. Experimental results demonstrate that the proposed approach predicts expert decisions with high accuracy and thus significantly minimizes the time required of an expert for data aggregation.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Automatic hidden-web table interpretation, conceptualization, and semantic annotation

The longstanding problem of automatic table interpretation still illudes us. Its solution would not only be an aid to table processing applications such as large volume table conversion, but would also be an aid in solving related problems such as information extraction, semantic annotation, and semi-structured data management. In this paper, we offer a solution for the common special case in w...

متن کامل

Automatic Hidden-Web Table Interpretation by Sibling Page Comparison

The longstanding problem of automatic table interpretation still illudes us. Its solution would not only be an aid to table processing applications such as large volume table conversion, but would also be an aid in solving related problems such as information extraction and semi-structured data management. In this paper, we offer a conceptual modeling solution for the common special case in whi...

متن کامل

Semi-automatic Extraction of Cross-Table Data from a Set of Spreadsheets

Spreadsheets are widely used in companies. End-users often value the high degree of flexibility and freedom spreadsheets provide. However, these features lead to the development of a variety of data forms inside spreadsheets. A cross-table is one of these forms of data. A cross-table is defined as a rectangular form of data, which expresses the relations between a set of objects and a set of at...

متن کامل

Automatic Extraction of Semi-structured Web Data

As a huge data source the internet contains a large number of valuable information, and the data of information is usually in the form of semi-structured in HTML web pages. In order to extract the web data and organize the data with the relationships which are similar to the real world, this paper has proposed a method for automatic data extraction from the web. With the combination of keywords...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2013